feat: Improve search indexing reliability #1065
Merged
This PR came about after #740 started causing the website to crash when editing/creating Work entities.
I took this opportunity to make the search indexing more reliable by improving error catching.
While rewriting some of that code, I realized we are indexing a lot of fields that we have no reason to index.
First of all, there are some internal ORM fields that are output by default when serializing to JSON.
Then, the entire entity info is stored in ElasticSearch, even though we load entities afresh from the DB based on the search result hits. This means that at the moment we do absolutely nothing with all this entity info other than take up disk space, slow down indexing and transfer too much data.
So we are now "cleaning" the documents before indexing them, i.e. keeping only the fields we need.
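As a rough illustration of the "cleaning" step, here is a minimal sketch of whitelisting fields before indexing. The field list, the `cleanDocument` name and the sample keys are hypothetical and only show the idea; the fields actually kept by this PR may differ.

```typescript
// Hypothetical whitelist of fields worth indexing (an assumption, not the PR's list).
const INDEXED_FIELDS = ['bbid', 'id', 'type', 'name', 'aliases'] as const;

type IndexDocument = Record<string, unknown>;

// Return a copy of `entity` that contains only the whitelisted keys it defines.
function cleanDocument(entity: Record<string, unknown>): IndexDocument {
  const doc: IndexDocument = {};
  for (const field of INDEXED_FIELDS) {
    if (entity[field] !== undefined) {
      doc[field] = entity[field];
    }
  }
  return doc;
}

// Example: ORM bookkeeping keys and unused entity data are dropped.
const cleaned = cleanDocument({
  bbid: 'hypothetical-bbid',
  type: 'Work',
  name: 'Example',
  internalOrmField: 3, // hypothetical stand-in for an internal ORM field
  annotation: {content: 'not needed for search'}
});
```

A whitelist is used here rather than a blacklist so that fields added to the entity later are not indexed by accident.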
In the future, we will want to store more information and return it directly instead of fetching from the DB for presentation, but that will require more rewriting.
If we are planning on moving to Solr soon enough, that seems like a possible waste of time. A good candidate for a part 2 in any case.
I also removed some manipulation we did to shoehorn collection, editor and area ids into a "bbid" field.
Indexing now supports an "id" field as well as "bbid".
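One way to support both identifier fields without copying "id" into "bbid" is to resolve the document id at indexing time. This is only a sketch under that assumption; `Indexable` and `getDocumentId` are hypothetical names, not the ones used in the PR.

```typescript
// Hypothetical shape: entities carry a "bbid", while collections, editors
// and areas carry an "id".
interface Indexable {
  bbid?: string;
  id?: string | number;
}

// Prefer "bbid" when present, otherwise fall back to "id".
function getDocumentId(doc: Indexable): string {
  const id = doc.bbid ?? doc.id;
  if (id === undefined) {
    throw new Error('Document has neither a "bbid" nor an "id" field');
  }
  return String(id);
}
```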
Finally, since we are now passing ORM models over for indexing, some rewriting was necessary wherever we call the search indexing, and I took that opportunity to rewrite some messy chained promises to async/await syntax.
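To give an idea of the kind of rewrite involved, here is a simplified before/after sketch. `saveEntity`, `indexEntity` and `indexingErrors` are hypothetical stand-ins for the real ORM and search calls, and recording an indexing failure instead of rethrowing it is an assumption about how the improved error catching behaves.

```typescript
// Hypothetical stand-in for an ORM model.
interface Model {
  toJSON(): Record<string, unknown>;
}
type Indexer = (model: Model) => Promise<void>;

const indexed: Record<string, unknown>[] = [];
const indexingErrors: unknown[] = [];

// Hypothetical stand-in for saving an entity through the ORM.
async function saveEntity(name: string): Promise<Model> {
  return {toJSON: () => ({name})};
}

// Hypothetical stand-in for the search indexing call.
async function indexEntity(model: Model): Promise<void> {
  indexed.push(model.toJSON());
}

// Before: chained promises.
function createEntityChained(name: string, index: Indexer = indexEntity): Promise<Model> {
  return saveEntity(name).then((model) =>
    index(model)
      .catch((err: unknown) => {
        indexingErrors.push(err);
      })
      .then(() => model));
}

// After: async/await. An indexing failure is recorded but does not
// prevent the saved model from being returned.
async function createEntity(name: string, index: Indexer = indexEntity): Promise<Model> {
  const model = await saveEntity(name);
  try {
    await index(model);
  } catch (err) {
    indexingErrors.push(err);
  }
  return model;
}
```

Both versions behave the same way; the second keeps saving and indexing at one level of nesting, which makes the error handling easier to follow.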